TNO Hierarchical topic detection report at TDT 2004

نویسندگان

  • Dolf Trieschnigg
  • Wessel Kraaij
چکیده

Hierarchical topic detection is a new task in the TDT 2004 evaluation program, which aims to organize a collection of unstructured news data in a directed acyclic graph (DAG) structure, reflecting the topics discussed in the collection, ranging from rather coarse category like nodes to fine singular events. The HTD task poses interesting challenges since its evaluation metric is composed of a travel cost component reflecting the time to find the node of interest starting from the top node and a quality cost component, determined by the quality of the selected node. We present a scalable architecture for HTD and compare several alternative choices for agglomerative clustering and DAG optimization in order to minimize the HTD cost metric. The alternatives are evaluated on the TDT3 and TDT5 test collections.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Results of the 2003 Topic Detection and Tracking Evaluation

The National Institute of Standards and Technology (NIST) administered the sixth open evaluation of Topic Detection and Tracking (TDT) technologies in November of 2003. The TDT project supports development of technologies that automatically organize eventrelated news stories. The program leverages expertise in core technologies, Automatic Speech Recognition (ASR), Document Retrieval (DR), and M...

متن کامل

Hierarchical Topic Detection in TDT-2004

Huge volume of news makes it hard for people to keep up with the latest information, and automatic processing of news information becomes necessary. Topic Detection and Tracking is a research program that deals with this problem. From the observations in TDT, news topics can be described in different sizes, making it hard to define the “correct” granularity. In TDT-2004, the topic detection tas...

متن کامل

Using language models for tracking events of interest over time

This paper presents the TNO tracking system which was evaluated at the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. We built a baseline tracking system based on a language modeling approach. This approach had proved to be powerful for the TREC adaptive filtering task and several other IR tasks.

متن کامل

Tdt-2004: Adaptive Topic Tracking at Maryland

A topic tracking system that combines elements from vector space and language modeling frameworks to compute document scores is described. The model is used for both the traditional TDT topic tracking evaluation design and the new supervised adaptive topic tracking evaluation. Results indicate that supervised adaptation and score normalization should be more closely coupled, and that current te...

متن کامل

Unsupervised Event Clustering in Multilingual News Streams

Abstract The Topic Detection and Tracking (TDT) benchmark evaluation project embraces a variety of technical challenges for information retrieval research. The TDT topic detection task is concerned with the unsupervised grouping of news stories according to the events they discuss. A detection system must both discover new events as the incoming stories are processed and associate incoming stor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004